Abstract

We have developed two types of systems for NTCIR2. One is an
enhanced version of the system we developed for NTCIR1 and IREX.
It submitted retrieval results for the JJ and CC tasks. A variety
of parameters were tried with the system. In the CC tasks, it used
characteristics of newspapers such as locational information. The
system obtained good results on both tasks. The other is a portable
system that avoids free parameters as much as possible. It
submitted retrieval results for the JJ, JE, EE, EJ, and CC tasks.
The system automatically determined the number of top documents
and the weight of the original query used in automatic-feedback
retrieval. It also determined relevant terms quite robustly. For
the EJ and JE tasks, it used document expansion to augment the
initial queries. It achieved good results, except on the CC tasks.

Keywords: newspaper article, locational informa- 
tion, portable system, flexible system. 



1 Introduction 

We have developed two systems for the second NTCIR Workshop's
information retrieval (IR) tasks.

One is an enhanced version of the system that was used for the
first NTCIR Workshop's IR tasks [?] and the IREX Workshop's IR
tasks [?]. We call this System A. The other is a newly developed
system in which free parameters are avoided as much as possible.
We call this System B.^1

System A participated in tasks set in Japanese and Chinese (JJ and
CC). It achieved high average precisions on both tasks. System B
participated in tasks set in Japanese, English, and Chinese (JJ,
JE, EE, EJ, and CC). It achieved high average precisions on the JJ,
EE, JE, and EJ tasks.

Although the two systems participated in some of the same tasks,
the details of the system implementations are rather different.
Thus, we describe the two systems separately, focusing on
particular tasks; i.e., we describe System A in the context of the
CC tasks and System B in the context of the JJ, EE, JE, and EJ
tasks.

^1 System A was developed mainly by the first author, and System B
was developed mainly by the second author.

2 Chinese IR Tasks 

In this section, we describe System A in the context of the CC
tasks. System A participated in the JJ tasks^2 and the CC tasks,
and achieved particularly good results on the CC tasks. The reason
is that the types of documents used in the CC tasks were very
different from those used in the JJ tasks. While the JJ tasks
involved retrieval from a database of academic conference papers,
the CC tasks involved retrieval from a database of newspaper
articles. System A^3 takes advantage of characteristics of
newspapers, such as the fact that the title or the first sentence
of the body of a newspaper article often indicates the article's
subject. We thus expected System A to be effective on the CC tasks.
In the following sections, we give a detailed description of
System A and report on the experimental results of its application
to the CC tasks.

2.1 Outline of System A 

System A uses Robertson's 2-Poisson model, which is one kind of
probabilistic approach. In Robertson's method, each document's
score is calculated by using the following equation.^4 The
documents that obtain high scores are then output as retrieval
results. (Score(d, q) below is the score of a document d against a
query q.)

$$
Score(d, q) = \sum_{\text{term } t \text{ in } q} \left( \frac{tf(d,t)}{tf(d,t) + k_t \frac{length(d)}{A}} \times \log\frac{N}{df(t)} \times \frac{tfq(q,t)}{tfq(q,t) + k_q} \right) \tag{1}
$$

where t indicates a term that appears in a query, tf(d, t) is the
frequency of t in a document d, tfq(q, t) is the frequency of t in
a query q, df(t) is the number of documents in which t appears, N
is the total number of documents, length(d) is the length of a
document d, and A is the average length of the documents. $k_t$ and
$k_q$ are constants which are set according to the results of
experiments.

^2 System A participated in the long-query and short-query JJ
tasks. The best average precisions on the two tasks in terms of A
judgement were 0.4082 (CRL20) and 0.3730 (CRL16), and the best
average R-precisions were 0.4210 (CRL20) and 0.3866 (CRL27). These
results are also good. Strings in parentheses indicate system IDs
in the NTCIR contest. Examination of System A's performance on the
JJ tasks is the subject of a forthcoming publication.

^3 System A is based on the system we entered in the IREX contest.
In the IREX contest, articles in a database of newspapers were
used as the test collection. System A achieved good results in the
IREX contest, too [?].

^4 This equation is BM11, which corresponds to BM25 in the case of
b = 1 [?].



In this equation, we call
$\frac{tf(d,t)}{tf(d,t) + k_t \frac{length(d)}{A}}$ the TF term
(abbr. $TF(d,t)$), $\log\frac{N}{df(t)}$ the IDF term (abbr.
$IDF(t)$), and $\frac{tfq(q,t)}{tfq(q,t) + k_q}$ the $TF_q$ term
(abbr. $TF_q(q,t)$).
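As an illustration of Eq. (1), the following minimal Python sketch computes the score of one document as the sum over query terms of the TF, IDF, and TF_q terms. It rests on several assumptions that are not stated in the text: documents and queries are given as lists of terms, length(d) is the number of terms in d, the logarithm is natural, and the default values of `k_t` and `k_q` are placeholders rather than the constants tuned in the experiments. The function name `score_eq1` is hypothetical.

```python
import math
from collections import Counter


def score_eq1(doc_terms, query_terms, df, n_docs, avg_len, k_t=1.0, k_q=0.1):
    """Score of one document against one query, following Eq. (1).

    df maps a term to the number of documents containing it.
    k_t and k_q defaults are placeholders (assumption), not tuned values.
    """
    tf = Counter(doc_terms)
    tfq = Counter(query_terms)
    length = len(doc_terms)  # assumption: length(d) counted in terms
    total = 0.0
    for t in tfq:
        if tf[t] == 0 or df.get(t, 0) == 0:
            continue  # a term absent from the document contributes nothing
        tf_term = tf[t] / (tf[t] + k_t * length / avg_len)
        idf_term = math.log(n_docs / df[t])
        tfq_term = tfq[t] / (tfq[t] + k_q)
        total += tf_term * idf_term * tfq_term
    return total


if __name__ == "__main__":
    s = score_eq1(["a", "b", "a"], ["a"], {"a": 10}, n_docs=100, avg_len=3.0)
    print(round(s, 4))
```

Ranking then amounts to computing this score for every document and sorting in descending order.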

In System A, several terms are added to extend this equation, and
its method is expressed by the following equation.

$$
Score(d, q) = K_{category}(d) \left\{ \sum_{\text{term } t \text{ in } q} \left( TF(d,t) \times IDF(t) \times TF_q(q,t) \times K_{location}(d,t) \times \left( \log\frac{N_q}{qf(t)} \right) \right) + \frac{length(d)}{length(d) + A} \right\} \tag{2}
$$

The TF, IDF, and $TF_q$ terms in this equation are identical to
those in Eq. (1). The value of the term
$\frac{length(d)}{length(d) + A}$ increases with the length of the
document. This term is introduced because, if all of the other
information is exactly the same, the longer document is more
likely to include content that is a relevant response to the
query. $N_q$ is the total number of queries and $qf(t)$ is the
number of queries in which t occurs. Those terms which occur more
frequently in queries are more likely to be stop words such as
"documents" and "thing." We decrease the scores of stop words by
using $\log\frac{N_q}{qf(t)}$. $K_{category}$ and $K_{location}$
are extended numerical terms that are introduced to improve the
precision of results. $K_{category}$ uses the category information
of the document found in newspapers, such as the economic or
political pages. $K_{location}$ uses the location of the term
within the document. If the term is in the title or at the
beginning of the body of the document, it is given a higher
weighting. In the next section, we explain these extended
numerical terms in detail.

2.2 Extended numerical terms

We use the two extended numerical terms $K_{location}$ and
$K_{category}$ as shown in Eq. (2). In this section, they are
explained in detail.

1. Location information ($K_{location}$)

In general, the title or the first sentence of the body of a
document in a newspaper indicates its subject. Therefore, the
precision of information retrieval can be improved by assigning
more weight to the terms from these two locations. This is
achieved by $K_{location}$, which adjusts the weight of a term on
the basis of whether or not it appears at the beginning of the
document. If a term is in the title or at the beginning of the
body, it is given a high weighting. Otherwise, it is given a low
weighting. $K_{location}$ is expressed as follows:

$$
K_{location}(d,t) =
\begin{cases}
k_{location,1} & \text{(when a term } t \text{ occurs in the title of a document } d\text{)}, \\
1 + k_{location,2} \dfrac{length(d) - 2\,P(d,t)}{length(d)} & \text{(otherwise)}
\end{cases}
\tag{3}
$$

P(d, t) is the location of a term t in the document d. When a term
appears more than once in a document, its first appearance is
used. $k_{location,1}$ and $k_{location,2}$ are constants which
are set according to the results of experiments.
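The location factor of Eq. (3) can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes that P(d, t) and length(d) are measured in the same unit (for example, characters from the start of the body), and the default constants 1.2 and 0.1 are the values read from the experimental settings reported with Table 1. The function name `k_location` is hypothetical.

```python
def k_location(in_title, position, length, k_loc1=1.2, k_loc2=0.1):
    """Location factor K_location(d, t) of Eq. (3).

    in_title: True if the term occurs in the title of the document.
    position: P(d, t), location of the first occurrence of the term.
    length:   length(d), in the same unit as position (assumption).
    """
    if in_title:
        return k_loc1
    return 1.0 + k_loc2 * (length - 2 * position) / length


if __name__ == "__main__":
    # The factor falls linearly from 1 + k_loc2 at the start of the body
    # to 1 - k_loc2 at the end, and equals 1 in the middle.
    for p in (0, 50, 100):
        print(p, k_location(False, p, 100))
```

With these constants a term at the very start of the body is weighted by about 1.1, a term at the very end by about 0.9, and a title term by 1.2.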



2. Categorical information ($K_{category}$)

$K_{category}$ uses category information such as whether or not
the document appears on the economic or political pages. This
operates by applying the technique called relevance feedback [?].
First, we specify the categories which occur in the top 15
documents of the first retrieval when $K_{category} = 1$. Then, we
increase the scores of documents that are in the majority or
most-frequent categories. For example, if the top 15 documents of
the first retrieval were most often from the economic pages, we
increase the scores of documents from the economic pages and
decrease the scores of all documents from other sections of the
newspaper. $K_{category}$ is expressed as follows:

$$
K_{category}(d) = 1 + k_{category} \frac{RatioA(d) - RatioB(d)}{RatioA(d) + RatioB(d)} \tag{4}
$$

where RatioA(d) is the proportion of the top 100 documents of the
first retrieval that are in the same category as d, and RatioB(d)
is the proportion of all the documents that are in that category.
The value of $K_{category}(d)$ is large when RatioA is large (the
top 100 documents of the first retrieval frequently appear on the
same pages as a document d) and RatioB is small (few of the
documents appear on the same pages as d). $k_{category}$ is a
constant which is set according to the results of experiments.
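A minimal Python sketch of Eq. (4) follows. It assumes each document carries a single category label and that the category labels of the top-ranked documents and of the whole collection are available as plain lists. The helper names are hypothetical, and returning 1 when both ratios are zero is an assumption added to avoid division by zero.

```python
def category_ratio(categories, category):
    """Proportion of the given category labels equal to `category`."""
    return categories.count(category) / len(categories)


def k_category(ratio_a, ratio_b, k_cat):
    """Category factor K_category(d) of Eq. (4)."""
    if ratio_a + ratio_b == 0:
        return 1.0  # assumption: neutral factor when the category never occurs
    return 1.0 + k_cat * (ratio_a - ratio_b) / (ratio_a + ratio_b)


if __name__ == "__main__":
    top = ["economy", "economy", "economy", "politics"]
    everything = ["economy"] * 2 + ["politics"] * 6 + ["sports"] * 2
    for cat in ("economy", "politics"):
        a = category_ratio(top, cat)
        b = category_ratio(everything, cat)
        print(cat, k_category(a, b, k_cat=0.15))
```

A category that is over-represented among the top documents yields a factor above 1, and an under-represented one yields a factor below 1; the factor always stays within 1 ± k_category.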

2.3 How terms are extracted 

Before Eq. (2) can be used in information retrieval, we must
extract terms from a query. This section describes how this is
done. With regard to term extraction, we considered the several
methods listed below.

1. Method of using only the shortest terms

This is the simplest method. In this method, the query sentence is
divided into short terms by using a morphological analyzer or a
similar tool. All of the short terms are used in the retrieval
process. The method used to divide the query sentence into short
terms is described in Section 2.4.



2. Method of using all term patterns

In the first method, the terms are too short. For example,
"enterprise" and "amalgamation" would be used instead of
"enterprise amalgamation."^5 We felt that "enterprise
amalgamation" should be used along with the two short terms.
Therefore, we decided to use both short and long terms. We call
this the "all-term-patterns method." For example, when "enterprise
amalgamation materialization" was input, we used "enterprise",
"amalgamation", "materialization", "enterprise amalgamation",
"amalgamation materialization", and "enterprise amalgamation
materialization" as terms for information retrieval. We felt that
this method would be effective because it makes use of all term
patterns. We also felt, however, that it is inequitable that only
the three terms "enterprise," "amalgamation," and
"materialization" are derived from "... enterprise ...
amalgamation ... materialization ...", while six terms are derived
from "enterprise amalgamation materialization." We examined
several methods of normalization in preliminary experiments, then
decided to divide the weight of each term by a normalization
factor computed from n, where n is the number of successive words.
For example, in the case of "enterprise amalgamation
materialization", n = 3.

^5 Although this part of the paper deals only with retrieval from
Chinese-language texts, and not English, we have used English
examples for the benefit of this English-language journal's
readers. This method handles compound nouns and can be applied not
only to Chinese but also to English.

[Figure 1. An example of a lattice structure. Nodes: "enterprise",
"amalgamation", "materialization", "enterprise amalgamation",
"amalgamation materialization", "enterprise amalgamation
materialization".]



3. Method using a lattice 

Although the all-term-patterns method effectively uses all
patterns of terms, it needs to be normalized by using an ad hoc
equation based on $n(n+1)$. We thus considered a method in which
all term patterns are stored in a lattice. We used the patterns in
the path with the highest score on Eq. (2). (This method is almost
the same as Ozawa's [?]. The differences are the fundamental
equation for information retrieval, and whether or not a
morphological analyzer is used.)

For example, in the case of "enterprise amalgamation
materialization" the lattice shown in Fig. 1 is obtained. As shown
in this figure, the score is calculated for each of the four paths
by using Eq. (2), and the terms in the highest-scoring path are
used. This method does not require the ad hoc normalization
required by the all-term-patterns method.

4. Method of using down-weighting 

This is the method that Fujita proposed at the IREX contest [?].
It is similar to the all-term-patterns method. It uses all term
patterns, but the method of normalization is different from that
used in the all-term-patterns method. The weights of the shortest
terms are kept constant while the weights of the longer terms are
decreased. We decided to apply the weight $k_{down}^{x-1}$ to such
terms, where x is the number of shortest terms that make up the
term and $k_{down}$ was set according to the results of
experiments.
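The term-extraction methods listed above can be illustrated with a short Python sketch. It enumerates all term patterns of a word sequence, enumerates the lattice paths (every way of cutting the sequence into contiguous terms), and assigns down-weights. The function names are hypothetical, and the exponent x − 1 in the down-weight is a reading of the garbled expression in the source, so it should be treated as an assumption.

```python
def all_term_patterns(words):
    """Every contiguous sub-sequence of the words, joined into a term."""
    n = len(words)
    return [" ".join(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def lattice_paths(words):
    """Every way of cutting the word sequence into contiguous terms."""
    if not words:
        return [[]]
    paths = []
    for j in range(1, len(words) + 1):
        head = " ".join(words[:j])
        for rest in lattice_paths(words[j:]):
            paths.append([head] + rest)
    return paths


def down_weights(words, k_down):
    """Weight k_down ** (x - 1) for a term made of x shortest terms (assumption)."""
    n = len(words)
    return {" ".join(words[i:j]): k_down ** (j - i - 1)
            for i in range(n) for j in range(i + 1, n + 1)}


if __name__ == "__main__":
    words = ["enterprise", "amalgamation", "materialization"]
    print(len(all_term_patterns(words)))  # n(n+1)/2 patterns
    for path in lattice_paths(words):
        print(path)
    print(down_weights(words, k_down=0.5))
```

For the three-word example there are n(n+1)/2 = 6 term patterns and four lattice paths, which agrees with the counts given in the text. In the lattice method, each path would be scored with the retrieval equation and only the terms of the best path kept.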

2.4 The method of dividing the query sentence
into short terms

We used the following three methods to divide the query sentence
into short terms.^6

^6 System A segments only queries; sentences of documents are not
segmented except for automatic feedback.



1. Using a morphological analyzer

In this method, the query sentence is segmented by using the
CSeg&Tag 1.0 Chinese-language morphological analyzer [17].

2. Segmentation by using mutual information

This method is based on the method [16] proposed by Sproat et al.
It calculates the mutual information of two adjacent characters
and divides them when their mutual information is low. The details
of our method are as follows.

Almost all Chinese words consist of one or two Chinese
characters.^7 So we assumed that all terms consist of one or two
Chinese characters. Thus, our method first divides Chinese
sentences into fragments which consist of one or two Chinese
characters by using mutual information. This is done by repeatedly
applying the following procedure.

• Divide up the pair of adjacent characters with the lowest amount
of mutual information, where the pair is part of a fragment which
consists of more than two Chinese characters.

Next, we use the statistics of the Chinese corpus. In this case,
we assume that the ratio of one-character words to two-character
words in a Chinese text is a:b.^8 We take this statistic and then
re-divide those fragments that consist of pairs of characters
having little mutual information into two separate one-character
words, in such a way that our process of division produces a text
broken up into one- and two-character words in the approximate
proportion a:b. This is done by repeating the following procedure
until the text is divided up in the approximate proportion a:b.

• Divide those fragments consisting of pairs of characters having
the lowest mutual information.

The result of this procedure is equivalent to that of the
following procedure.

• Divide up those fragments consisting of pairs of characters
having a level of mutual information which is equal to or lower
than $k_{cmi}$, where $k_{cmi}$ is the amount of mutual
information that will divide up the text to produce the
approximate proportion a:b.

3. Using both of the above two methods

This method first divides up the Chinese sentences by using the
morphological analyzer and then further divides up the fragments
by using mutual information and the statistics on the Chinese
corpus.

^7 According to the paper [16], the occurrence rate of words which
consist of three Chinese characters is under 1%.

^8 For example, Sproat stated that this ratio is about 7:3 [16].
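The mutual-information segmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the mutual information of each adjacent character pair is supplied by the caller as a list (how it is estimated from a corpus is outside the sketch), the threshold `k_cmi` is given rather than derived from the ratio a:b, and Latin letters stand in for Chinese characters. The function name is hypothetical.

```python
def segment_by_mutual_information(sentence, mi, k_cmi):
    """Split `sentence` into one- and two-character fragments.

    mi[i] is the mutual information between sentence[i] and sentence[i + 1].
    """
    n = len(sentence)
    if n == 0:
        return []
    cuts = set()  # a cut at p separates sentence[p - 1] from sentence[p]

    def fragments():
        bounds = [0] + sorted(cuts) + [n]
        return list(zip(bounds, bounds[1:]))

    # Step 1: while some fragment has more than two characters, cut the
    # adjacent pair with the lowest mutual information inside such fragments.
    while True:
        candidates = [i for start, end in fragments() if end - start > 2
                      for i in range(start, end - 1)]
        if not candidates:
            break
        weakest = min(candidates, key=lambda i: mi[i])
        cuts.add(weakest + 1)

    # Step 2: split two-character fragments whose mutual information is
    # equal to or lower than the threshold k_cmi.
    for start, end in fragments():
        if end - start == 2 and mi[start] <= k_cmi:
            cuts.add(start + 1)

    return [sentence[start:end] for start, end in fragments()]


if __name__ == "__main__":
    mi = [5.0, 1.0, 6.0, 2.0]  # pairs AB, BC, CD, DE
    print(segment_by_mutual_information("ABCDE", mi, k_cmi=4.5))
    print(segment_by_mutual_information("ABCDE", mi, k_cmi=5.5))
```

Raising `k_cmi` splits more two-character fragments, which is how the proportion of one-character to two-character words is steered toward a:b.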

2.5 Automatic feedback in System A 

Automatic feedback is also used in System A. In System A, an
element of automatic feedback is included via the IDF term of
Eq. (2). When performing automatic feedback, we substitute the
following equation for the original IDF term.

$$
IDF(t) = \{ E(t) + k_{af} \times (RatioC(t) - RatioD(t)) \} \times IDF_{orig}(t) \tag{5}
$$

$$
E(t) =
\begin{cases}
1 & \text{(when a term } t \text{ is in a query)}, \\
0 & \text{(otherwise)}
\end{cases}
\tag{6}
$$

where RatioC(t) is the proportion of the top $k_r$ documents of
the first retrieval in which a term t appears, RatioD(t) is the
proportion of all of the documents in which a term t appears, and
$IDF_{orig}(t)$ is the original IDF term. This formula is based on
Rocchio's formula [?]. $k_{af}$ and $k_r$ are constants set
according to the results of experiments.

Term expansion is also used in System A. The terms 'Terms' as
defined below are added.

$$
Terms = \{ t \mid P(t) > k_p \} \tag{7}
$$

where P(t) is the probability that a term t appears in no less
than n documents of the top $k_r$ documents. P(t) is approximately
calculated by assuming that the appearance of the term t follows a
binomial distribution with a probability equal to the occurrence
rate of the term t in all the documents. $k_p$ is a constant set
according to the results of experiments.
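The feedback-adjusted IDF term of Eqs. (5) and (6), and the binomial probability used for term expansion, can be sketched as below. The default `k_af = 0.7` is the value used in most runs of Table 1; the function names are hypothetical, and the sketch deliberately stops at computing the quantities and does not implement the term-selection rule itself.

```python
import math


def feedback_idf(idf_orig, in_query, ratio_c, ratio_d, k_af=0.7):
    """IDF term adjusted for automatic feedback, Eqs. (5) and (6).

    ratio_c: proportion of the top k_r documents containing the term.
    ratio_d: proportion of all documents containing the term.
    """
    e = 1.0 if in_query else 0.0
    return (e + k_af * (ratio_c - ratio_d)) * idf_orig


def binomial_tail(n, k_r, p):
    """Probability of at least n successes in k_r trials with success rate p."""
    return sum(math.comb(k_r, i) * p ** i * (1 - p) ** (k_r - i)
               for i in range(n, k_r + 1))


if __name__ == "__main__":
    # A query term found in 80% of the top documents but 10% of the collection.
    print(feedback_idf(2.0, True, ratio_c=0.8, ratio_d=0.1))
    # Chance that a term with collection rate 0.1 appears in 3 or more of 5 documents.
    print(binomial_tail(3, 5, 0.1))
```

A term outside the query has E(t) = 0, so it contributes only if it is more frequent in the top-ranked documents than in the collection as a whole.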

2.6 Weighted counting in automatic feedback

We considered that a term which occurs in a document which has a
higher rank on the first retrieval is more important. So, when
counting the frequency of a term t in a document d with a rank of
Rank(d), System A applied the following factor AFW(t, d) to the
frequency.

$$
AFW(t, d) = (k_{afw} + 1) - 2 \times k_{afw} \times [\ldots] \tag{8}
$$

where $k_{afw}$ is a constant set according to the results of
experiments. The automatic-feedback equations above are calculated
by using the frequency weighted by Equation (8).

2.7 Experiments 

The experimental results of System A are shown in Table 1. "LO",
"SO", "VS", and "TI" indicate a long-query task, a short-query
task, a very-short-query task, and a title-query task. The column
"ID" indicates the system ID in the NTCIR 2 contest. "-" in "ID"
indicates a system which was not submitted for the formal run of
the NTCIR 2 contest. The column "Term" indicates the method used
to divide the query sentence up into short terms. "TAG", "MI", and
"T+M" respectively indicate the use of the Chinese morphological
analyzer, mutual information, and both the morphological analyzer
and mutual information. $k_{cmi}$,^9 $k_{Nq}$, $k_r$, and $k_{af}$
are set as in Table 1. "dw", "af", "L", and "C" indicate the
down-weighting method, automatic feedback method, locational
information, and categorical information. "y" in a column
indicates the use of the method, and "n" indicates that the method
was not used. When we do not use the down-weighting method, we use
the shortest-terms method as the method of extracting terms.^10
The other parameters are set as follows: $k_{location,1} = 1.2$,
$k_{location,2} = 0.1$, $k_{category} = [\ldots]$, $k_t = 1$,
$k_q = [\ldots]$, $[\ldots] = 0.9$, and $k_{afw} = 0.5$. "s" in
"L" and "C" means the strong setting where $k_{location,1} = 1.3$,
$k_{location,2} = 0.15$, and $k_{category} = 0.15$. "†" in
$k_{Nq}$ means using $\log\frac{N_q}{qf(t)}$ in a more complex way
such that qf(t) means the number of queries whose titles contain a
term t.

Table 1. Experimental results in CC tasks

|     | Task | ID | Term | k_cmi | k_Nq | dw | af | L | C | k_r | k_af | R-Prec. (rigid) | R-Prec. (relax) | Ave. Prec. (rigid) | Ave. Prec. (relax) |
| S1  | LO   | 07 | MI   | 4.5   |      | y  | y  | y | y | 5   | 0.7  | 0.5751 | 0.6610 | 0.6348 | 0.7261 |
| S2  | LO   | -  | MI   | 4.5   |      | y  | n  | y | y | 5   | 0.7  | 0.5529 | 0.6564 | 0.6186 | 0.7146 |
| S3  | LO   | -  | MI   | 4.5   |      |    | n  | y | y | 5   | 0.7  | 0.5660 | 0.6572 | 0.6183 | 0.7118 |
| S4  | LO   | 08 | MI   | 4.5   | ?    |    | y  | y | y | 5   | 0.7  | 0.5842 | 0.6692 | 0.6392 | 0.7362 |
| S5  | LO   | 09 | MI   | 4.5   | ?    | y  | y  | y | y | 5   | 0.7  | 0.5803 | 0.6651 | 0.6386 | 0.7342 |
| S6  | LO   | 02 | MI   | 3     |      | y  | y  | y | y | 5   | 0.7  | 0.5812 | 0.6685 | 0.6439 | 0.7326 |
| S7  | LO   | 03 | MI   | 3     |      | y  | n  | y | y | 5   | 0.7  | 0.5632 | 0.6699 | 0.6329 | 0.7231 |
| S8  | LO   | 04 | MI   | 3     |      | ?  | y  | y | y | 5   | 0.7  | 0.5865 | 0.6684 | 0.6438 | 0.7325 |
| S9  | LO   | 05 | MI   | 3     |      | ?  | n  | y | y | 5   | 0.7  | 0.5587 | 0.6695 | 0.6329 | 0.7229 |
| S10 | LO   | 06 | MI   | 3     |      |    | y  | y | y | 5   | 0.7  | 0.5782 | 0.6813 | 0.6459 | 0.7409 |
| S11 | LO   | 10 | MI   | 3     |      | y  | y  | y | y | 5   | 0.7  | 0.5780 | 0.6724 | 0.6427 | 0.7383 |
| S12 | LO   | 19 | MI   | 4     | ?    | y  | y  | y | y | 5   | 0.7  | 0.5814 | 0.6161 | 0.6407 | 0.7399 |
| S13 | LO   | -  | MI   | 4     | ?    | y  | y  | n | y | 5   | 0.7  | 0.5659 | 0.6704 | 0.6316 | 0.7334 |
| S14 | LO   | -  | MI   | 4     | ?    | y  | y  | y | n | 5   | 0.7  | 0.5916 | 0.6945 | 0.6567 | 0.7488 |
| S15 | LO   | -  | MI   | 4     | ?    | y  | y  | n | n | 5   | 0.7  | 0.5778 | 0.6822 | 0.6530 | 0.7445 |
| S16 | LO   | 18 | MI   | 4     | ?    | y  | y  | y | y | 5   | 1    | 0.5900 | 0.6752 | 0.6415 | 0.7387 |
| S17 | LO   | 20 | MI   | 4     | ?    | y  | y  | y | y | 7   | 0.7  | 0.5746 | 0.6778 | 0.6388 | 0.7374 |
| S18 | LO   | 21 | MI   | 4     | ?    | y  | y  | y | y | 10  | 0.7  | 0.5605 | 0.6741 | 0.6299 | 0.7316 |
| S19 | LO   | 11 | MI   | 4     | ?    | y  | y  | y | y | 15  | 0.7  | 0.5743 | 0.6776 | 0.6265 | 0.7291 |
| S20 | LO   | 12 | MI   | 4     | ?    | y  | y  | y | y | 20  | 0.7  | 0.5577 | 0.6767 | 0.6254 | 0.7268 |
| S21 | LO   | 13 | MI   | 4     | ?    | y  | y  | s | s | 5   | 0.7  | 0.5709 | 0.6703 | 0.6203 | 0.7271 |
| S22 | LO   | 14 | T+M  | 4     | ?    | y  | y  | y | y | 5   | 0.7  | 0.5924 | 0.6810 | 0.6486 | 0.7413 |
| S23 | LO   | -  | TAG  | 4     | ?    | y  | y  | y | y | 5   | 0.7  | 0.5936 | 0.6803 | 0.6501 | 0.7419 |
| S24 | LO   | 15 | T+M  | 4     | ?    | y  | n  | y | y | 5   | 0.7  | 0.5820 | 0.6778 | 0.6388 | 0.7290 |
| S25 | LO   | 17 | T+M  | 4     | ?    | n  | y  | y | y | 5   | 0.7  | 0.5712 | 0.6739 | 0.6341 | 0.7276 |
| S26 | LO   | 16 | T+M  | 4     | ?    | n  | n  | y | y | 5   | 0.7  | 0.5557 | 0.6628 | 0.6165 | 0.7145 |
| S27 | SO   | 02 | MI   | 4     | ?    | y  | y  | y | y | 5   | 0.7  | 0.5831 | 0.6817 | 0.6340 | 0.7368 |
| S28 | SO   | 03 | T+M  | 4     |      | y  | y  | y | y | 5   | 0.7  | 0.5974 | 0.6766 | 0.6529 | 0.7376 |
| S29 | VS   | 02 | MI   | 4     |      | y  | y  | y | y | 5   | 0.7  | 0.5990 | 0.6788 | 0.6516 | 0.7387 |
| S30 | VS   | 03 | T+M  | 4     |      | y  | y  | y | y | 5   | 0.7  | 0.6089 | 0.6749 | 0.6596 | 0.7397 |
| S31 | VS   | -  | T+M  | 4     |      | y  | y  | n | y | 5   | 0.7  | 0.5893 | 0.6669 | 0.6468 | 0.7282 |
| S32 | VS   | -  | T+M  | 4     |      | y  | y  | y | n | 5   | 0.7  | 0.6027 | 0.6781 | 0.6722 | 0.7454 |
| S33 | VS   | -  | T+M  | 4     |      | y  | y  | n | n | 5   | 0.7  | 0.5889 | 0.6636 | 0.6563 | 0.7350 |
| S34 | VS   | -  | TAG  | 4     |      | y  | y  | y | y | 5   | 0.7  | 0.6086 | 0.6757 | 0.6604 | 0.7399 |
| S35 | TI   | 02 | MI   | 4     |      | y  | y  | y | y | 5   | 0.7  | 0.4683 | 0.5923 | 0.4813 | 0.6239 |
| S36 | TI   | 03 | T+M  | 4     |      | y  | y  | y | y | 5   | 0.7  | 0.4651 | 0.5770 | 0.4793 | 0.6118 |

The number of queries is 50. The number of documents is 132,173.

^9 In the CHIR newspapers database, using $k_{cmi}$ = 5.33, 4.96,
4.56, 4.10, and 3.53 divides up the text to produce the
approximate proportions of 7:3, 6.5:3.5, 6:4, 5.5:4.5, and 5:5.

^10 Our previous work [?] had confirmed that the use of all term
patterns is not a good method, and that even the simple method of
using only the shortest terms can achieve good results.

The following were the findings produced by the experimental
results.



• The precisions of "T+M" or "TAG" are slightly 
higher than that of "MI." We thus found that us- 
ing the morphological analyzer produced better 
results than using mutual information. 

• By comparing S12 with S13 or S30 with S31, we 
found that locational information achieved an im- 
provement of about 0.02 or 0.03. We can see that 
locational information is very effective. 

• By comparing S12 with S14 or S30 with S32, we found that the
precisions when categorical information was not used were higher
than the precisions when it was used. So, at least for these data,
using category information did not help.

• The automatic feedback method was always ef- 
fective. 

• The down-weighting method sometimes pro- 
duced better results and sometimes produced 
poorer results. 

2.8 Summary 

System A uses characteristics of newspapers such as locational
information and obtained good results in the CC tasks. By
performing comparative experiments, we confirmed that locational
information was effective. The other kinds of information were,
however, not so effective.

System A has many parameters and many methods. 
In the future, we would like to conduct much more 
extensive experiments in order to examine the effects 
of parameters and methods in System A. 

3 Japanese and English IR Tasks 

3.1 Overview of the results 

The average precisions for System-B against relevant documents on
the JJ, EJ, EE, and JE tasks are presented in Table 2. In Table 2,
'very short' means that the system used the 'TITLE' part of the
queries for retrieval, 'short' means that it used the
'DESCRIPTION' part of the queries, and 'long' means that it used
all parts of the queries except the 'FIELD' part. For each task,
'feedback' means the precisions that were obtained by
automatic-feedback retrieval, while 'initial' means the precisions
that were obtained by using the raw initial queries. The symbol
'*' means that the corresponding search results from System-B were
submitted to the NTCIR 2 workshop committee as formal runs.^11 For
the JJ and EE tasks, only 'feedback' results from System-B were
submitted, while for the EJ and JE tasks, both 'initial' and
'feedback' results were submitted. These average precisions place
the system in the highest-scoring group among those for which
results were submitted.

^11 On JJ short, System-A outperformed System-B. Its best average
precision was 0.3730.

We describe System-B in detail below. We start by describing the
scoring function used to rank documents. Next, we describe the
design issues involved in selecting possible free parameters and
then compare results for various parameter values through
experimental results. Finally, we conclude this section with a
brief summary.

3.2 Scoring function 

Our scoring function is based on BM11 [?]. Let D be a document and
Q be a query, where D and Q have been tokenized into words. D and
Q are bags of words. We define |X| as the number of words in X and
define tf(w|X) as the number of occurrences of a word w in X. We
also define W(X) as the set of different words in X.

The score of D given Q, score(D|Q), is defined as:

$$
score(D \mid Q) = \sum_{w \in W(D) \cap W(Q)} d(w \mid D)\, q(w \mid Q), \tag{9}
$$

where d(w|D) is the weight of w given D and q(w|Q) is the weight
of w given Q. d(w|D) is defined as:

$$
d(w \mid D) = \frac{tf(w \mid D)}{tf(w \mid D) + |D| / A}, \tag{10}
$$

where A is the average of |D| over the document collection
$\mathcal{D}$ that contains D, i.e.,

$$
A = \frac{1}{|\mathcal{D}|} \sum_{D \in \mathcal{D}} |D|, \tag{11}
$$

where $|\mathcal{D}|$ is the number of documents in $\mathcal{D}$.
q(w|Q) is defined as:

$$
q(w \mid Q) = \frac{(k_q + 1)\, tf(w \mid Q)}{k_q + tf(w \mid Q)}\, idf(w), \tag{12}
$$

where $k_q = 1000$ and

$$
idf(w) = \log \frac{|\mathcal{D}|}{|\mathcal{D}(w)|}, \tag{13}
$$

where $|\mathcal{D}(w)|$ is the number of documents that contain
w. $\mathcal{D}(w)$ is, of course, a subset of $\mathcal{D}$.

score(D|Q) is used for the initial search. For an automatic
feedback search, we use Score(D|Q):

$$
Score(D \mid Q) = \sum_{w \in W(D) \cap W(Q')} d(w \mid D)\, q'(w \mid Q), \tag{14}
$$

where

$$
q'(w \mid Q) = \alpha\, q(w \mid Q) + [\ldots], \tag{15}
$$

where α is a number, $D_i$ is the i-th top document retrieved by
the initial search, R is the number of top-scoring documents used
in the automatic-feedback search, and F is the function used to
select appropriate terms from a document. Q' in Equation (14) is
defined as:

$$
Q' = Q \cup F(D_1) \cup \cdots \cup F(D_R). \tag{16}
$$

Table 2. Average Precision (Relevant).

|    |          | very short | short   | long    |
| JJ | initial  | 0.2112     | 0.3082  | 0.3807  |
| JJ | feedback | 0.2706*    | 0.3396* | 0.4303* |
| EJ | initial  |            | 0.2497* | 0.3156* |
| EJ | feedback |            | 0.2564* | 0.3260* |
| EE | initial  | 0.2192     | 0.2714  | 0.3684  |
| EE | feedback | 0.2523*    | 0.3131* | 0.4043* |
| JE | initial  |            | 0.3409* | 0.3855* |
| JE | feedback |            | 0.3413* | 0.3856* |

* represents submitted runs.
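The initial-search score of Eqs. (9) to (13) can be sketched in Python as follows. The sketch assumes that documents and queries are already tokenized into lists of words, that the logarithm is natural, and that every scored document belongs to the collection (so that each of its words has a non-zero document frequency). The name `make_scorer` is hypothetical; the feedback score of Eq. (14) is not covered.

```python
import math
from collections import Counter


def make_scorer(collection, k_q=1000.0):
    """Build score(D|Q) of Eq. (9) for a collection of tokenized documents."""
    n_docs = len(collection)
    avg_len = sum(len(doc) for doc in collection) / n_docs        # A, Eq. (11)
    doc_freq = Counter(w for doc in collection for w in set(doc))  # |D(w)|

    def idf(w):
        return math.log(n_docs / doc_freq[w])                      # Eq. (13)

    def score(doc, query):
        tf_d, tf_q = Counter(doc), Counter(query)
        total = 0.0
        for w in tf_d.keys() & tf_q.keys():                        # W(D) ∩ W(Q)
            d_w = tf_d[w] / (tf_d[w] + len(doc) / avg_len)         # Eq. (10)
            q_w = (k_q + 1) * tf_q[w] / (k_q + tf_q[w]) * idf(w)   # Eq. (12)
            total += d_w * q_w
        return total

    return score


if __name__ == "__main__":
    docs = [["a", "b"], ["a", "c"], ["d", "d"]]
    score = make_scorer(docs)
    ranked = sorted(docs, key=lambda doc: score(doc, ["b", "a"]), reverse=True)
    print(ranked)
```

With $k_q = 1000$, the factor $(k_q + 1)\,tf / (k_q + tf)$ stays close to tf itself for realistic query term frequencies, so the query weight is nearly tf × idf.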

3.3 Design Issues 

The free parameters we consider in this paper are α, F, and R in
Equation (15). We tried to have these parameters determined
automatically. Before describing our attempts at determining these
parameters, however, we discuss how we preprocessed documents and
queries for the JJ, EE, JE, and EJ tasks.^12

3.3.1 Tokenization 

Tokenization is, to a large degree, language dependent.

We tokenized Japanese texts (documents or queries) by using ChaSen
version 2.02^13 and then extracted lemmas of content words as D or
Q. We postprocessed the output of ChaSen to eliminate some
erroneous patterns of tokenization.

In a similar way, we used LimaTK^14 to morphologically analyze
English texts and then used a stemmer built around a library
available in the WordNet 1.6 package^15 to lemmatize content
words. Stop words were removed according to the list in the Nice
stemmer package.^16

The documents and queries thus processed were used for the JJ and
EE tasks.

^12 The method used to preprocess documents and queries for the CC
tasks is similar to, but more primitive than, a method described
in Section [?]. We thus omit a description here.

^13 http://chasen.aist-nara.ac.jp/
^14 http://cl.aist-nara.ac.jp/~tatuo-y/ma/
^15 http://www.cogsci.princeton.edu/~wn/
^16 http://www.ils.unc.edu/iris/irisnstem.htm



3.3.2 Query translation 

For the JE and EJ tasks, we translated queries. Once we translate
queries, cross-lingual IR (CLIR, i.e., JE or EJ) is performed by
the same method as used for mono-lingual IR (JJ or EE). We
describe the method below as applied to the translation of a
Japanese query into English. English to Japanese translation is
performed in a similar way.

We perform document expansion [15] to augment the original
queries; i.e., for a Japanese query, we first search the Japanese
database to get documents that are relevant to the query. Next, we
extract the words contained in the top-5 documents and combine
them with the original query. We thus obtain an expanded Japanese
query.^17

The expanded Japanese query is then translated into English. For
the translation, we first made a Japanese-to-English bilingual
dictionary from the Japanese-English abstract pairs provided for
the first NTCIR Workshop. From those pairs, we extracted
Japanese-English keyword pairs contained in the abstract pairs. It
was possible for these keywords to be phrases or words. If a
Japanese keyword co-occurred with multiple English keywords, then
we selected the most frequently co-occurring English keyword as
the translation of the Japanese keyword [?]. Texts were translated
in the following two steps: we used ChaSen to morphologically
analyze the text, then translated the sequence of morphemes into
English. The translation was on a word-to-word or phrase-to-phrase
basis. Disambiguation by contexts was not used. The translation
was based on longest matches. For example, if a query 'a b c' is
given, where 'a' is translated into 'A' and 'a b c' is translated
into 'D E', then 'a b c' is translated into 'D E'.^18
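The dictionary construction and longest-match translation described above can be sketched as follows. The sketch makes several assumptions that the text does not settle: keywords are represented as tuples of tokens, co-occurring keyword pairs are given as a flat list, and a token with no dictionary entry is simply dropped. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_dictionary(keyword_pairs):
    """Map each source keyword to its most frequently co-occurring target keyword."""
    counts = defaultdict(Counter)
    for source, target in keyword_pairs:
        counts[source][target] += 1
    return {source: c.most_common(1)[0][0] for source, c in counts.items()}


def translate_longest_match(tokens, dictionary):
    """Translate a token sequence, always preferring the longest dictionary entry."""
    max_len = max((len(key) for key in dictionary), default=1)
    out, i = [], 0
    while i < len(tokens):
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + span])
            if key in dictionary:
                out.extend(dictionary[key])
                i += span
                break
        else:
            i += 1  # assumption: tokens without a translation are dropped
    return out


if __name__ == "__main__":
    pairs = [(("a",), ("A",)), (("a",), ("A",)), (("a",), ("X",)),
             (("a", "b", "c"), ("D", "E"))]
    dictionary = build_dictionary(pairs)
    print(translate_longest_match(["a", "b", "c"], dictionary))
    print(translate_longest_match(["a", "b"], dictionary))
```

The first call reproduces the example in the text: although 'a' alone has a translation, the longer entry 'a b c' wins.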

^17 Local context analysis has been used to expand queries in
CLIR. The comparison is left for future work.

^18 [?] also used a longest-match algorithm, but they did not use
a morphological analyzer, which might degrade the system
performance. This belief is supported by Table 3, which shows the
performance of our method with no document expansion. The average
precision of [?] on the same task was 0.3216, while that of our
approach is 0.3364.



Translated queries were used for the JE and EJ tasks. The
retrieval algorithm was the same as that used for the JJ and EE
tasks.

As is shown in Table 2, our approach to the JE and EJ tasks worked
quite well. It is evident, however, that the degree of success of
our approach depends on the degree of similarity between the
Japanese database and the English database used for CLIR. We thus
conducted another experiment which used the databases and
JE-queries provided for the first NTCIR Workshop. The type of
query used for the experiment was 'long' except that we did not
use English concepts.



Table 3. Average precisions with document expansion.

Source    Target    Average precision
φ         ntc1-e    0.3364
ntc2-j    ntc1-e    0.3628
ntc1-j    ntc1-e    0.3899

In Table 3, the column 'Source' lists the databases used to
expand the original queries. 'φ' indicates no document expansion.
'ntc2-j' means that the Japanese database which was freshly added
for the second NTCIR Workshop was used for document expansion,
and 'ntc1-j' means that the Japanese database provided for the
first NTCIR Workshop was used for document expansion. 'ntc1-e',
which is listed in the 'Target' column for all entries, is the
English database that was the target of the searches for
documents. Average precision was evaluated against relevant
documents in 'ntc1-e'.

'ntc1-j' and 'ntc1-e' are nearly parallel. Naturally, this pair
achieved the best performance of the three cases. 'ntc2-j' and
'ntc1-e' are comparable. With this pair, the average precision is
still better than with no document expansion. Document expansion
is thus worthwhile for CLIR.

We have briefly described the language-dependent parts of
System B. Next, we describe its language-independent parts,
describing F, R, and α in Equation ([l|), in that order.

3.3.3 Definition of F

We define the relevance of word w for the top-scoring R
documents in terms of probability.[^

Given a bag of words X, the probability of w, Pr(w|X), and its
variance, Var(w|X), are estimated as

$$ \Pr(w \mid X) = \frac{tf(w \mid X) + 1}{|X| + 2}, \tag{17} $$

$$ \mathrm{Var}(w \mid X) = \frac{\Pr(w \mid X)\,\bigl(1 - \Pr(w \mid X)\bigr)}{|X| + 3}. \tag{18} $$

We then define $D_1^R$ as the bag of words that contains all the
words in $D_1, D_2, \ldots, D_R$, and define $\overline{D_1^R}$ as
the complement of $D_1^R$ with respect to a universal set that is
defined by all of the words in the document collection
$\mathcal{D}$. The relevance of word w, $rel(w \mid D_1^R)$, is
defined as

$$ rel(w \mid D_1^R) = \frac{\Pr(w \mid D_1^R) - \Pr(w \mid \overline{D_1^R})}{\sqrt{\mathrm{Var}(w \mid D_1^R) + \mathrm{Var}(w \mid \overline{D_1^R})}}. \tag{19} $$

Finally, we define $F(D_i)$ as

$$ F(D_i) = \{\, w \mid rel(w \mid D_1^R) > \theta \wedge w \in D_i \,\}, \tag{20} $$

where θ is a predefined threshold.

"[10] also uses a probabilistic metric to select relevant terms.

$rel(w \mid D_1^R)$ approximately follows the standard normal
distribution. Possible candidates for θ are 1.28, 1.65 and 2.33,
which correspond to significance levels of 0.10, 0.05 and 0.01,
respectively. Hereafter, significance levels are represented
by p.

We used p = 0.10 (θ = 1.28) for all of the submitted runs.[]
This choice was based on previous experiments conducted on the
database provided for the first NTCIR Workshop. p = 0.10 is a
robust parameter value for term selection, as is shown in
section 3.4.
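
The term-selection test of this section can be sketched in a few lines. The sketch assumes that Equation (19) is the difference of the two estimated probabilities divided by the square root of the sum of their variances, which is what makes the score approximately standard normal. The function names and the use of `collections.Counter` to represent bags of words are illustrative choices, not the authors' code.

```python
import math
from collections import Counter


def pr(tf, size):
    """Estimated probability of a word, following Equation (17)."""
    return (tf + 1) / (size + 2)


def var(tf, size):
    """Estimated variance of that probability, following Equation (18)."""
    p = pr(tf, size)
    return p * (1 - p) / (size + 3)


def relevance(word, top_bag, rest_bag):
    """Score of `word` for the top-ranked bag against the rest of the collection."""
    n_top = sum(top_bag.values())
    n_rest = sum(rest_bag.values())
    numerator = pr(top_bag[word], n_top) - pr(rest_bag[word], n_rest)
    denominator = math.sqrt(var(top_bag[word], n_top) + var(rest_bag[word], n_rest))
    return numerator / denominator


def select_terms(top_bag, rest_bag, theta=1.28):
    """Words of the top-ranked bag whose relevance exceeds the threshold theta."""
    return {w for w in top_bag if relevance(w, top_bag, rest_bag) > theta}


# Toy data (hypothetical): word counts in the top documents and in the collection.
top = Counter({"retrieval": 8, "the": 10})
collection = Counter({"retrieval": 10, "the": 1000, "other": 990})
rest = collection - top  # complement of the top-ranked bag
print(select_terms(top, rest))  # {'retrieval'}
```

In this toy example 'retrieval' is concentrated in the top documents and passes the test, while 'the' is about as frequent elsewhere and is rejected.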



3.3.4 Definition of R

We used the method explained below to set R automatically. We
found, however, that the method was not effective, as is shown
in section 3.4.

Our method is based on the degree of increase in the number of
different words in top-scoring documents. If the content of
successive documents is similar, the documents should share
keywords. This degree of increase is thus low while similar
documents continue. Our algorithm is depicted in Figure 2. In the
experiments described in section 3.4, the average numbers of
documents selected by the algorithm were 3.81, 4.01, and 3.95
for 'very short', 'short', and 'long' queries, respectively.



As is shown in section 3.4, the performance of IR is quite
sensitive to R. We will therefore investigate methods for the
automatic determination of R in future work, though our initial
attempts have not been too successful.
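
The stopping rule of Figure 2 can be sketched as follows. This is a simplified illustration under stated assumptions: each ranked document is given as the list of its already selected terms, the loop is capped at the number of available documents, and the value of R at which the loop breaks is the value returned. None of these details are confirmed by the paper, and the function names are hypothetical.

```python
def distinct_word_counts(ranked_docs):
    """sizes[i] = number of distinct words in the first i documents (sizes[0] = 0)."""
    seen = set()
    sizes = [0]
    for doc in ranked_docs:
        seen.update(doc)
        sizes.append(len(seen))
    return sizes


def choose_r(ranked_docs, start=3):
    """Return the first R >= start at which vocabulary growth speeds up again."""
    sizes = distinct_word_counts(ranked_docs)

    def diff(i):
        # Increase in the number of distinct words caused by the i-th document.
        return sizes[i] - sizes[i - 1]

    for r in range(start, len(ranked_docs) + 1):
        if diff(r) > diff(r - 1):
            return r
    # Assumption: if growth never speeds up, use every available document.
    return len(ranked_docs)


docs = [["a", "b", "c"], ["a", "b", "d"], ["a", "b"], ["x", "y", "z"], ["q"]]
print(choose_r(docs))  # 4
```

Here the first three documents share most of their words, so growth slows; the fourth document introduces three new words, growth speeds up, and the loop stops at R = 4.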

3.3.5 Definition of α

α is defined heuristically as follows:

$$ \alpha = \left| W(F(D_1^R)) \right|^{1/|W(Q)|}, \tag{21} $$

where

$$ F(D_1^R) = F(D_1) \cup F(D_2) \cup \cdots \cup F(D_R). \tag{22} $$

α ≥ 1 holds because $|W(F(D_1^R))| \geq 1$. α approaches 1 when
|W(Q)| is large. α takes a large value when the number of
different words in Q is small and the number of different words
in $D_1^R$ is large. α is defined so that Q is more important
than $D_1^R$. In the experiments described in section 3.4, the
average values of α were 13.44, 3.89, and 1.14 for 'very short',
'short', and 'long' queries, respectively.

-"We actually used a more precise value for θ.

for (R = 3; ; R++) {
    if (diff(R) > diff(R-1)) {
        break;
    }
}

int diff(i) {
    return |W(F(D_1^i))| - |W(F(D_1^{i-1}))|;
}

Figure 2. Algorithm for determining R.

This heuristic approach worked reasonably well, as is shown in
section 3.4.
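
The behaviour of α can be checked numerically with a small sketch. It assumes that Equation (21) reads α = |W(F(D_1^R))|^(1/|W(Q)|), that is, the number of distinct selected feedback words raised to the reciprocal of the number of distinct query words. That reading matches the properties stated in this section (α is at least 1, approaches 1 for long queries, and is large for short queries with many feedback words). The function name and the input counts are hypothetical.

```python
def query_weight(num_feedback_words, num_query_words):
    """Heuristic weight of the original query under the assumed form of Equation (21)."""
    if num_feedback_words < 1 or num_query_words < 1:
        raise ValueError("both counts must be at least 1")
    return num_feedback_words ** (1.0 / num_query_words)


# Hypothetical counts: 100 distinct feedback words, queries of 2 and 35 distinct words.
print(query_weight(100, 2))             # 10.0
print(round(query_weight(100, 35), 2))  # 1.14
```

With the same feedback vocabulary, a two-word query receives a weight of 10 while a 35-word query receives a weight close to 1, so short queries are protected from being swamped by feedback terms.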

In summary, for the formal runs, we used p = 0.10 for term
selection and used automatic methods to set R and α. p = 0.10
was the only parameter that we had to set by hand.

3.4 Comparison of parameter values

We varied the values of p, R, and α to observe the effects of
parameter values on performance. Performance was measured by the
average precision against relevant documents. We used the
queries and documents provided for the second NTCIR Workshop.
Experiments were conducted on the JJ and EE tasks. We report
only the results for the JJ tasks here, because both sets of
results displayed the same tendency.

The parameter values for p were

$$ p = 0.10,\ 0.05,\ 0.01. \tag{23} $$

The parameter values for R were

$$ R = 1,\ 3,\ 5,\ 7,\ 10,\ 15. \tag{24} $$

The parameter values for α were

$$ \alpha = 0.5,\ 1.0,\ 1.5. \tag{25} $$



For R and α, we also tried the heuristic methods described in
Figure 2 and Equation (21). We tried all combinations of these
parameter values. Thus, we conducted 3 × 7 × 4 = 84 runs to make
our comparison for each of the 'long', 'short', and 'very short'
queries.
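
The run count and the per-value averaging used in this comparison can be sketched as follows. The label "var" stands for the automatically determined setting, following the naming used in the figures; the function names and the layout of `results` (a mapping from a parameter triple to an average precision) are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

P_VALUES = [0.10, 0.05, 0.01]
R_VALUES = [1, 3, 5, 7, 10, 15, "var"]   # six fixed values plus the automatic method
ALPHA_VALUES = [0.5, 1.0, 1.5, "var"]    # three fixed values plus the heuristic


def all_runs():
    """Every (p, R, alpha) combination tried for one query type."""
    return list(product(P_VALUES, R_VALUES, ALPHA_VALUES))


def mean_by_value(results, position):
    """Average the average precisions of all runs sharing one parameter value.

    results: dict mapping (p, R, alpha) to an average precision.
    position: 0 for p, 1 for R, 2 for alpha.
    """
    groups = defaultdict(list)
    for setting, avg_precision in results.items():
        groups[setting[position]].append(avg_precision)
    return {value: sum(scores) / len(scores) for value, scores in groups.items()}


print(len(all_runs()))  # 84
```

Fixing one value of p leaves 7 × 4 = 28 runs to average over, fixing R leaves 12, and fixing α leaves 21.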

To evaluate the effectiveness of a parameter, we fixed its value
and then calculated the average of the average precisions over
those of the 84 runs that used that value. The results are shown
in Figures 3, 4 and 5. In these figures, horizontal axes
represent the query types and vertical axes represent the
average precisions. The title of each line indicates the
parameter value. 'var' means that values are determined by our
methods proposed above. 'initial' means the results for the
initial search. The titles are in order of decreasing average
precision for short queries.

Figure 3 shows the results for various settings of p. Note that
p = 0.1 and p = 0.05 performed equally well. This suggests that
the value of p is robust over this range.[]

Figure 4 shows the results for various settings of R. It is
difficult to detect any clear tendency in Figure 4, but it seems
that when queries are long, small R values perform well, and
when queries are short, large R values perform well. This
suggests that the length of queries could be used to set R
automatically.

Figure 5 shows the results for various settings of α. The
average values of α were 13.44, 3.89, and 1.14 for 'very short',
'short', and 'long' queries, respectively. α takes large values
for 'very short' and 'short' queries. It takes small values for
'long' queries. α worked reasonably well. This is because for
'very short' and 'short' queries, the results of the initial
search are not very reliable, so we had to weight Q heavily,
while for 'long' queries, the results of the initial search are
reliable, so we do not have to weight Q so heavily.

3.5 Summary 

System B was designed as a portable IR system that avoids free
parameters as much as possible. It will be possible to improve
the system's performance by providing a proper method for
determining the number of top-ranked documents to be used in
automatic feedback.

4 Conclusion

We have developed two systems for the second NTCIR Workshop IR
tasks. One was an improved version of the system that was used
for the first NTCIR Workshop IR tasks and the IREX Workshop IR
tasks. The other was a freshly developed system that avoids free
parameters as much as possible. The former system participated
in the JJ and CC tasks and the latter system participated in the
JJ, EE, JE, EJ and CC tasks. Both systems achieved good results.
We have not yet compared the two systems thoroughly. In the
future, we will conduct a more detailed examination of our
systems and will determine what kinds of information are
effective.
Additional experiments showed that values of p from 0.9 to 0.05
perform equally well. (The results for p = 0.1 were slightly
better than those for other values.)